Extending Automatic Discourse Segmentation for Texts in Spanish to Catalan

Authors

  • Iria da Cunha
  • Eric SanJuan
  • Juan-Manuel Torres-Moreno
  • Irene Castellón
  • Marina Lloberes
Abstract

Automatic discourse analysis is currently an active research topic in NLP, yet discourse remains one of the most difficult phenomena to process. Although discourse parsers have already been developed for several languages, no such tool exists for Catalan. The first step towards implementing this kind of parser is to develop a discourse segmenter. In this article we present the first discourse segmenter for texts in Catalan. This segmenter is based on Rhetorical Structure Theory (RST) for Spanish, and uses lexical and syntactic information to translate rules valid for Spanish into rules for Catalan. We have evaluated the system against a gold standard corpus of manually segmented texts, and the results are promising.
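As a rough illustration of the idea of carrying segmentation rules from Spanish over to Catalan, the following minimal sketch maps a few Spanish discourse markers to Catalan equivalents and splits a sentence before each marker. It is not the authors' system: the marker table `ES_TO_CA_MARKERS` and the functions `translate_markers` and `segment` are hypothetical names introduced here, and the sketch uses lexical cues only, whereas the abstract states that the real segmenter also relies on syntactic information.

```python
import re

# Hypothetical marker table (an assumption for illustration, not taken from the paper):
# a few Spanish discourse markers and their usual Catalan equivalents.
ES_TO_CA_MARKERS = {
    "porque": "perquè",
    "aunque": "encara que",
    "pero": "però",
    "cuando": "quan",
}


def translate_markers(es_markers, mapping):
    """Map Spanish markers to Catalan ones, skipping markers with no known equivalent."""
    return [mapping[m] for m in es_markers if m in mapping]


def segment(sentence, markers):
    """Split a sentence into candidate discourse segments before each marker.

    Purely lexical: a real segmenter would also check syntactic conditions,
    for example that each resulting segment contains a verb.
    """
    if not markers:
        return [sentence.strip()]
    # Try longer markers first so multiword markers are matched before shorter ones.
    alternatives = "|".join(
        re.escape(m) for m in sorted(markers, key=len, reverse=True)
    )
    # Split on the whitespace that immediately precedes a marker.
    boundary = re.compile(r"\s+(?=(?:%s)\b)" % alternatives, flags=re.IGNORECASE)
    return [part.strip() for part in boundary.split(sentence) if part.strip()]


if __name__ == "__main__":
    ca_markers = translate_markers(["porque", "pero", "cuando"], ES_TO_CA_MARKERS)
    for unit in segment("No vindrà perquè està malalt, però trucarà quan pugui.", ca_markers):
        print(unit)
```

Keeping the marker table separate from the splitting logic reflects the point made in the abstract: the rule itself stays the same across the two languages, and only the language-specific lexical material is swapped.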

Similar resources

Representing discourse for automatic text summarization via shallow NLP techniques

In this thesis I have addressed the problem of text summarization from a linguistic perspective. After reviewing some work in the area, I have found that many satisfactory approaches to text summarization rely on general properties of language that are reflected in the surface realization of texts. The main claim of this thesis is that some general properties of the discursive organization of t...


SegProso: A Praat-Based Tool for the Automatic Detection and Annotation of Prosodic Boundaries in Speech Corpora

In this paper we describe SegProso, a Praat-based tool for the automatic segmentation in prosodic units of speech corpora. It is made up of a set of Praat scripts that add several tiers, each one containing the segmentation of a different unit, to a previously existing TextGrid file including the phonetic segmentation of the associated wav file. It has been successfully used for the annotation ...


VIII SNRFAI 1998 1 A Statistical Spanish – Catalan Translator : A Preliminary

A system for automatic text translation between two similar languages, Spanish and Catalan, is presented. A statistical approach has been used to develop a probability model of the translation process. We describe how the corpus has been obtained and we introduce a new algorithm for aligning parallel texts, based on dynamic programming. Two different approaches have been utilized for the search of th...


Cunha towards discourse parsing in Spanish

Texts can be analysed from different perspectives. One of the most difficult phenomena to process is discourse structure (Hovy 2010). In recent years, one of the main challenges in the field of natural language processing (NLP) has been discourse parsing. Research on this topic has been done for several languages, such as Japanese (Sumita et al. 1992), English (Marcu 2000) and Portuguese (Pardo...


Building a Discourse-Annotated Dutch Text Corpus

We are compiling a corpus of Dutch texts annotated with discourse structure and lexical cohesion, containing initially 80 texts from expository and persuasive genres. We are using this resource for corpus-based studies of discourse relations, discourse markers, cohesion, and genre differences. We are also exploring the possibilities of automatic text segmentation and semi-automatic discourse an...



Publication date: 2016